A Novel Approach to Morphological Analysis for Tamil Language
Abstract
This paper presents a morphological analyzer for Tamil, a complex agglutinative language, built with a machine learning approach. Morphological analysis is concerned with retrieving the structure, the syntactic rules, the morphological properties and the meaning of a morphologically complex word. The morphological structure of an agglutinative language is unique, and capturing its complexity in a format that a machine can analyze and generate is a challenging task. Morphological analyzers are generally built with a rule-based approach, but in such an approach, rules that work in the forward direction may not work in the backward direction. Our novel approach to morphological analysis is based on sequence labeling, with the model trained by kernel methods. It captures the non-linear relationships and the various morphological features of Tamil in a better and simpler way. We compare our system with existing morphological analyzers that are available online. In terms of accuracy, our system significantly outperforms the existing morphological analyzers, achieving a very competitive accuracy of 95.65% for Tamil.
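To make the sequence-labeling framing concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. It assumes romanized input labeled one character at a time (the paper's actual labeling unit may differ), a hypothetical `B-`/`I-` scheme in which `B` marks the first character of a morpheme and `I` a continuation, and illustrative tag names (`ROOT`, `PAST`, `3SM`). The helpers `encode`, `decode` and `window_features` are hypothetical. `window_features` shows the kind of per-position context a kernel classifier such as an SVM could consume; no classifier is trained here, and the labels come from a hand-made, illustrative segmentation of *padiththaan* ("he studied") as *padi* + *thth* + *aan*.

```python
from typing import Dict, List, Tuple

# Hypothetical sketch: morphological analysis as per-character sequence labeling.
# Assumed label scheme: "B-<TAG>" starts a morpheme, "I-<TAG>" continues it.


def encode(morphemes: List[Tuple[str, str]]) -> Tuple[str, List[str]]:
    """Turn (morpheme, tag) pairs into a surface word plus one label per character.
    Morphemes are assumed to be non-empty."""
    word = "".join(m for m, _ in morphemes)
    labels: List[str] = []
    for morpheme, tag in morphemes:
        labels.append(f"B-{tag}")                      # first character of the morpheme
        labels.extend(f"I-{tag}" for _ in morpheme[1:])  # remaining characters
    return word, labels


def decode(word: str, labels: List[str]) -> List[Tuple[str, str]]:
    """Turn per-character labels (e.g. a classifier's predictions) back into
    (morpheme, tag) pairs."""
    if len(word) != len(labels):
        raise ValueError("need exactly one label per character")
    morphemes: List[Tuple[str, str]] = []
    for ch, label in zip(word, labels):
        prefix, tag = label.split("-", 1)
        if prefix == "B" or not morphemes:
            morphemes.append((ch, tag))                # start a new morpheme
        else:
            segment, segment_tag = morphemes[-1]
            morphemes[-1] = (segment + ch, segment_tag)  # extend the current one
    return morphemes


def window_features(word: str, i: int, size: int = 2) -> Dict[str, str]:
    """Context features for position i: the character itself and its neighbours
    within +/- size, padded with '#' at the word edges."""
    padded = "#" * size + word + "#" * size
    j = i + size  # index of word[i] inside the padded string
    feats = {"cur": padded[j]}
    for k in range(1, size + 1):
        feats[f"prev{k}"] = padded[j - k]
        feats[f"next{k}"] = padded[j + k]
    return feats


if __name__ == "__main__":
    word, labels = encode([("padi", "ROOT"), ("thth", "PAST"), ("aan", "3SM")])
    print(labels)
    print(window_features(word, 4))  # context of the first character of "thth"
    print(decode(word, labels))      # [('padi', 'ROOT'), ('thth', 'PAST'), ('aan', '3SM')]
```

In a real system, a trained classifier would predict `labels` from the `window_features` of each position, and `decode` would recover the morphemes and their grammatical tags from those predictions.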